61 research outputs found

    An Extremal Inequality for Long Markov Chains

    Let $X, Y$ be jointly Gaussian vectors, and consider random variables $U, V$ that satisfy the Markov constraint $U - X - Y - V$. We prove an extremal inequality relating the mutual informations between all ${4 \choose 2}$ pairs of random variables from the set $(U, X, Y, V)$. As a first application, we show that the rate region for the two-encoder quadratic Gaussian source coding problem follows as an immediate corollary of the extremal inequality. In a second application, we establish the rate region for a vector-Gaussian source coding problem where L\"{o}wner-John ellipsoids are approximated based on rate-constrained descriptions of the data.
    Comment: 18 pages, 1 figure. Submitted to Transactions on Information Theory
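
    For orientation, the classical data-processing inequality already orders several of these six pairwise mutual informations; the display below is a standard consequence of the chain $U - X - Y - V$ and is stated here as background, not as the paper's extremal inequality.

```latex
% Data-processing consequences of the Markov chain U - X - Y - V,
% relating three of the six pairwise mutual informations.
\[
  I(U;V) \le I(U;Y) \le I(X;Y),
  \qquad
  I(U;V) \le I(X;V) \le I(X;Y).
\]
```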

    Bias Correction with Jackknife, Bootstrap, and Taylor Series

    We analyze bias correction methods based on the jackknife, the bootstrap, and Taylor series. We focus on the binomial model and consider the problem of bias correction for estimating $f(p)$, where $f \in C[0,1]$ is arbitrary. We characterize the supremum norm of the bias of general jackknife and bootstrap estimators for any continuous function, and demonstrate that in the delete-$d$ jackknife, different values of $d$ may lead to drastically different behavior. We show that in the binomial model, iterating the bootstrap bias correction infinitely many times may lead to divergence of bias and variance, and demonstrate that the bias properties of the bootstrap bias-corrected estimator after $r-1$ rounds are of the same order as those of the $r$-jackknife estimator if a bounded coefficients condition is satisfied.
    Comment: to appear in IEEE Transactions on Information Theory
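
    As a concrete reference point, below is a minimal sketch of the delete-1 jackknife in the binomial model. The leave-one-out average of the plug-in $f(X/n)$ has a closed form, since deleting a trial either removes a success or a failure; the example functional $f(p) = p(1-p)$ is an illustrative assumption, not one singled out by the paper.

```python
def jackknife_estimate(f, x: int, n: int) -> float:
    """Delete-1 jackknife bias correction of the plug-in f(x/n) for X ~ Bin(n, p)."""
    plug_in = f(x / n)
    # Closed-form average of the n leave-one-out plug-ins: deleting one of the
    # x successes gives f((x-1)/(n-1)); deleting a failure gives f(x/(n-1)).
    loo_mean = 0.0
    if x > 0:
        loo_mean += (x / n) * f((x - 1) / (n - 1))
    if x < n:
        loo_mean += ((n - x) / n) * f(x / (n - 1))
    return n * plug_in - (n - 1) * loo_mean

# For f(p) = p(1 - p) the plug-in bias is exactly -p(1 - p)/n, and the
# jackknife estimate simplifies to x(n - x) / (n(n - 1)), which is unbiased.
print(jackknife_estimate(lambda p: p * (1 - p), x=30, n=100))
```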

    Deconstructing Generative Adversarial Networks

    We deconstruct the performance of GANs into three components:

    1. Formulation: we propose a perturbation view of the population target of GANs. Building on this interpretation, we show that GANs can be viewed as a generalization of the robust statistics framework, and propose a novel GAN architecture, termed Cascade GANs, to provably recover meaningful low-dimensional generator approximations when the real distribution is high-dimensional and corrupted by outliers.

    2. Generalization: given a population target of GANs, we design a systematic principle, projection under admissible distance, for designing GANs that meet the population requirement using finite samples. We implement our principle in three cases to achieve polynomial and sometimes near-optimal sample complexities: (1) learning an arbitrary generator under an arbitrary pseudonorm; (2) learning a Gaussian location family under TV distance, where we utilize our principle to provide a new proof of the optimality of the Tukey median viewed as a GAN; (3) learning a low-dimensional Gaussian approximation of a high-dimensional arbitrary distribution under Wasserstein distance. We demonstrate a fundamental trade-off between approximation error and statistical error in GANs, and show how to apply our principle to empirical samples to predict how many samples are sufficient for GANs in order not to suffer from the discriminator winning problem.

    3. Optimization: we demonstrate that alternating gradient descent is provably not locally asymptotically stable in optimizing the GAN formulation of PCA, as illustrated by the toy sketch below. We diagnose the problem as the minimax duality gap being non-zero, and propose a new GAN architecture whose duality gap is zero, where the value of the game is equal to the previous minimax value (not the maximin value). We prove that the new GAN architecture is globally asymptotically stable in optimization under alternating gradient descent.
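
    The instability in point 3 can be seen already in the simplest minimax problem. The following toy sketch uses the bilinear game min_x max_y xy (not the paper's PCA formulation): alternating gradient descent/ascent orbits the equilibrium (0, 0) indefinitely instead of converging to it.

```python
# Alternating gradient descent/ascent on f(x, y) = x * y, whose unique
# saddle point is (0, 0). The linear update map has determinant 1, so the
# iterates cycle on a closed orbit and never approach the equilibrium.
eta = 0.1                        # step size
x, y = 1.0, 1.0
for _ in range(10_000):
    x = x - eta * y              # descent step on x
    y = y + eta * x              # ascent step on y, using the updated x
print((x * x + y * y) ** 0.5)    # radius stays of order 1: no convergence
```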

    Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance

    We present \emph{Local Moment Matching (LMM)}, a unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance. We construct an efficiently computable estimator that achieves the minimax rates in estimating the distribution up to permutation, and show that the plug-in approach of our unlabeled distribution estimator is "universal" in estimating symmetric functionals of discrete distributions. Instead of performing best polynomial approximation explicitly, as in the existing literature on functional estimation, the plug-in approach conducts polynomial approximation implicitly and attains the optimal sample complexity for the entropy, power sum, and support size functionals.

    Relations between Information and Estimation in Discrete-Time L\'evy Channels

    Fundamental relations between information and estimation have been established in the literature for the discrete-time Gaussian and Poisson channels. In this work, we demonstrate that such relations hold for a much larger class of observation models. We introduce the natural family of discrete-time L\'evy channels where the distribution of the output conditioned on the input is infinitely divisible. For L\'evy channels, we establish new representations relating the mutual information between the channel input and output to an optimal expected estimation loss, thereby unifying and considerably extending results from the Gaussian and Poisson settings. We demonstrate the richness of our results by working out two examples of L\'evy channels, namely the gamma channel and the negative binomial channel, with corresponding relations between information and estimation. Extensions to the setting of mismatched estimation are also presented.
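
    For reference, the prototype of such information-estimation relations is the I-MMSE formula for the discrete-time Gaussian channel (Guo, Shamai, and Verd\'u, 2005), which the L\'evy-channel representations extend; it is reproduced here for orientation.

```latex
% I-MMSE relation for the Gaussian channel Y = sqrt(gamma) X + N:
% the derivative of the mutual information equals half the MMSE.
\[
  \frac{\mathrm{d}}{\mathrm{d}\gamma}\, I\bigl(X;\sqrt{\gamma}\,X+N\bigr)
  = \tfrac{1}{2}\, \mathbb{E}\!\left[\bigl(X-\mathbb{E}[X \mid \sqrt{\gamma}\,X+N]\bigr)^{2}\right],
  \qquad N \sim \mathcal{N}(0,1) \text{ independent of } X.
\]
```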

    Minimax Estimation of the $L_1$ Distance

    We consider the problem of estimating the $L_1$ distance between two discrete probability measures $P$ and $Q$ from empirical data in a nonasymptotic and large-alphabet setting. When $Q$ is known and one obtains $n$ samples from $P$, we show that for every $Q$, the minimax rate-optimal estimator with $n$ samples achieves performance comparable to that of the maximum likelihood estimator (MLE) with $n\ln n$ samples. When both $P$ and $Q$ are unknown, we construct minimax rate-optimal estimators whose worst-case performance is essentially that of the known-$Q$ case with $Q$ being uniform, implying that a uniform $Q$ is essentially the most difficult case. The \emph{effective sample size enlargement} phenomenon, identified in Jiao \emph{et al.} (2015), holds both in the known-$Q$ case for every $Q$ and in the unknown-$Q$ case. However, the construction of optimal estimators for $\|P-Q\|_1$ requires new techniques and insights beyond the approximation-based method of functional estimation in Jiao \emph{et al.} (2015).
    Comment: to appear in IEEE Transactions on Information Theory
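
    For concreteness, the MLE baseline referenced above is just the plug-in of empirical frequencies; a minimal sketch for the known-$Q$ case follows (the integer-coded alphabet is an assumption of the sketch). Per the abstract, this baseline needs on the order of $n\ln n$ samples to match what the minimax rate-optimal estimator achieves with $n$.

```python
import numpy as np

def plugin_l1(samples: np.ndarray, q: np.ndarray) -> float:
    """Plug-in (MLE) estimate of ||P - Q||_1 with Q known.

    `samples` holds n draws from P, coded as integers in {0, ..., len(q)-1}.
    """
    p_hat = np.bincount(samples, minlength=len(q)) / len(samples)
    return float(np.abs(p_hat - q).sum())
```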

    Minimax Estimation of Discrete Distributions under $\ell_1$ Loss

    We analyze the problem of discrete distribution estimation under $\ell_1$ loss. We provide non-asymptotic upper and lower bounds on the maximum risk of the empirical distribution (the maximum likelihood estimator), and on the minimax risk in regimes where the alphabet size $S$ may grow with the number of observations $n$. We show that among distributions with bounded entropy $H$, the asymptotic maximum risk for the empirical distribution is $2H/\ln n$, while the asymptotic minimax risk is $H/\ln n$. Moreover, we show that a hard-thresholding estimator oblivious to the unknown upper bound $H$ is asymptotically minimax. However, if we constrain the estimates to lie in the simplex of probability distributions, then the asymptotic minimax risk is again $2H/\ln n$. We draw connections between our work and the literature on density estimation, entropy estimation, total variation distance ($\ell_1$ divergence) estimation, joint distribution estimation in stochastic processes, normal mean estimation, and adaptive estimation.
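
    A minimal sketch of the hard-thresholding idea follows; the threshold level $c\ln n/n$ and the constant $c$ are illustrative assumptions, not the paper's tuned choice.

```python
import numpy as np

def hard_threshold_estimate(samples: np.ndarray, alphabet_size: int,
                            c: float = 1.0) -> np.ndarray:
    """Hard-thresholded empirical distribution (sketch; threshold is assumed)."""
    n = len(samples)
    p_hat = np.bincount(samples, minlength=alphabet_size) / n
    p_hat[p_hat < c * np.log(n) / n] = 0.0  # zero out unreliable small masses
    # Deliberately left unnormalized: per the abstract, forcing the estimate
    # back into the probability simplex doubles the asymptotic risk from
    # H/ln(n) to 2H/ln(n).
    return p_hat
```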

    On Estimation of $L_r$-Norms in Gaussian White Noise Models

    We provide a complete picture of asymptotically minimax estimation of $L_r$-norms (for any $r \ge 1$) of the mean in the Gaussian white noise model over Nikolskii-Besov spaces. In this regard, we complement the work of Lepski, Nemirovski and Spokoiny (1999), who considered the cases of $r = 1$ (with a poly-logarithmic gap between upper and lower bounds) and $r$ even (with asymptotically sharp upper and lower bounds) over H\"{o}lder spaces. We additionally consider asymptotically adaptive minimax estimation and demonstrate a difference between even and non-even $r$ in terms of an investigator's ability to produce asymptotically adaptive minimax estimators without paying a penalty.
    Comment: To appear in Probability Theory and Related Fields

    The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal

    We analyze the Kozachenko--Leonenko (KL) nearest neighbor estimator for the differential entropy. We obtain the first uniform upper bound on its performance over H\"older balls on a torus without assuming any conditions on how close the density can be to zero. Together with a new minimax lower bound over the H\"older ball, we show that the KL estimator achieves the minimax rates up to logarithmic factors without cognizance of the smoothness parameter $s$ of the H\"older ball for $s \in (0,2]$ and arbitrary dimension $d$, rendering it the first estimator that provably satisfies this property.
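
    Below is a minimal sketch of the KL estimator itself (output in nats). It uses Euclidean distances in $\mathbb{R}^d$, whereas the paper's setting is a torus; wrap-around distances would be needed there (cKDTree supports periodic boxes via its boxsize argument).

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(x: np.ndarray, k: int = 1) -> float:
    """Kozachenko--Leonenko differential entropy estimate from an (n, d) sample."""
    n, d = x.shape
    r, _ = cKDTree(x).query(x, k=k + 1)   # column 0 is each point itself
    log_unit_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    # H_hat = psi(n) - psi(k) + log V_d + (d/n) * sum_i log r_i(k)
    return float(digamma(n) - digamma(k) + log_unit_ball
                 + d * np.mean(np.log(r[:, -1])))
```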

    Minimax Rate-Optimal Estimation of Divergences between Discrete Distributions

    We study the minimax estimation of $\alpha$-divergences between discrete distributions for integer $\alpha \ge 1$, which include the Kullback--Leibler divergence and the $\chi^2$-divergence as special examples. Dropping the usual theoretical tricks to acquire independence, we construct the first minimax rate-optimal estimator which does not require any Poissonization, sample splitting, or explicit construction of approximating polynomials. The estimator uses a hybrid approach which solves a problem-independent linear program based on moment matching in the non-smooth regime, and applies a problem-dependent bias-corrected plug-in estimator in the smooth regime, with a soft decision boundary between these regimes.
    Comment: This version has been significantly revised